__nbid__ = '0010'
__author__ = 'Robert Nikutta <robert.nikutta@noirlab.edu>, Stéphanie Juneau <stephanie.juneau@noirlab.edu>, Knut Olsen <knut.olsen@noirlab.edu>, David Nidever <david.nidever@noirlab.edu>'
__version__ = '20240603' # yyyymmdd
__datasets__ = ['smash_dr1']
__keywords__ = ['science example', 'dwarf galaxies']
Detecting the Hydra II dwarf galaxy in SMASH DR1¶
Robert Nikutta, Stéphanie Juneau, Knut Olsen, David Nidever, & Astro Data Lab Team
Table of contents¶
- Goals & notebook summary
- Disclaimer & attribution
- Imports & setup
- Query the SMASH DR1 catalog for blue stars (expected in metal-poor dwarfs)
- Spatial density plot
- Detect overdensities (differential Gaussian kernels)
- Identify peaks
- Inspect the image cutouts around the peaks (Simple Image Access)
- CM diagrams of all stars around the peaks
- Save your results to a local file
- Resources
Goals¶
We will use a dwarf galaxy detection algorithm using a spatial overdensity of blue stars to re-discover the Hydra II dwarf galaxy in the SMASH DR1 dataset. Hydra II (red circle in the map below) is one of ~60 dwarf galaxy satellites of the Milky Way.
Summary¶
Background Ultrafaint dwarf galaxies are crucial to understanding many aspects of the universe. For instance, they are dominated by dark matter; their localization in space can thus trace the large-scale structure of the dark matter distribution. Furthermore, dwarf galaxies are suspected to host intermediate-mass black holes (IMBH), which so far have eluded efforts to find them. IMBHs will naturally bridge the gap between the solar-mass black hole and super-massive blackholes that reside at the center of virtually every large galaxy.
Data retrieval We will retrieve data from Field 169 of the SMASH catalog (Nidever et al. (2017, AJ, 154, 199)) and look for overdensities of blue objects.
The required columns are RA, Dec, and the g, r, i magnitudes.
Detection We will convolve the spatial distribution of our dataset with a pair of Gaussian kernels and subtract the results, as done in e.g. Stanford et al. (2005, ApJ, 634, 2, L129) (galaxy clusters), or Koposov et al. (2008, ApJ, 686, 279) (MW satellites). This has the effect of convolving the spatial distribution with a Mexican hat filter, which is useful for detecting objects at a desired spatial scale.
Disclaimer & attribution¶
Disclaimers¶
Note that using the Astro Data Lab constitutes your agreement with our minimal Disclaimers.
Acknowledgments¶
If you use Astro Data Lab in your published research, please include the text in your paper's Acknowledgments section:
This research uses services or data provided by the Astro Data Lab, which is part of the Community Science and Data Center (CSDC) Program of NSF NOIRLab. NOIRLab is operated by the Association of Universities for Research in Astronomy (AURA), Inc. under a cooperative agreement with the U.S. National Science Foundation.
If you use SPARCL jointly with the Astro Data Lab platform (via JupyterLab, command-line, or web interface) in your published research, please include this text below in your paper's Acknowledgments section:
This research uses services or data provided by the SPectra Analysis and Retrievable Catalog Lab (SPARCL) and the Astro Data Lab, which are both part of the Community Science and Data Center (CSDC) Program of NSF NOIRLab. NOIRLab is operated by the Association of Universities for Research in Astronomy (AURA), Inc. under a cooperative agreement with the U.S. National Science Foundation.
In either case please cite the following papers:
Data Lab concept paper: Fitzpatrick et al., "The NOAO Data Laboratory: a conceptual overview", SPIE, 9149, 2014, https://doi.org/10.1117/12.2057445
Astro Data Lab overview: Nikutta et al., "Data Lab - A Community Science Platform", Astronomy and Computing, 33, 2020, https://doi.org/10.1016/j.ascom.2020.100411
If you are referring to the Data Lab JupyterLab / Jupyter Notebooks, cite:
- Juneau et al., "Jupyter-Enabled Astrophysical Analysis Using Data-Proximate Computing Platforms", CiSE, 23, 15, 2021, https://doi.org/10.1109/MCSE.2021.3057097
If publishing in a AAS journal, also add the keyword: \facility{Astro Data Lab}
And if you are using SPARCL, please also add \software{SPARCL} and cite:
- Juneau et al., "SPARCL: SPectra Analysis and Retrievable Catalog Lab", Conference Proceedings for ADASS XXXIII, 2024 https://doi.org/10.48550/arXiv.2401.05576
The NOIRLab Library maintains lists of proper acknowledgments to use when publishing papers using the Lab's facilities, data, or services.
Imports and setup¶
# std lib
import warnings
warnings.filterwarnings('ignore') # to suppress some astropy depracation warnings
# 3rd party
import numpy as np
from astropy import utils, io, convolution, stats
from astropy.visualization import make_lupton_rgb
from photutils import find_peaks
from pyvo.dal import sia
import pylab as plt
%matplotlib inline
# Data Lab
from dl import authClient as ac, queryClient as qc
from dl.helpers.utils import convert
# plots default setup
plt.rcParams['font.size'] = 14
Query the SMASH DR1 catalog for blue stars (expected in metal-poor dwarfs)¶
We will query the averaged photometry table from the SMASH catalog and select field #169. We'll issue two queries, one in which we select blue (-0.4 < g-r < 0.4) star-like objects (|sharp|<0.5), and the other without those constraints, to be used for comparison. For both queries, we will avoid photometry taken only with short exposures (depthflag>1), exclude objects with fewer than 4 detections, and apply a magnitude constraint of 9 < g < 25, so as to avoid excessive contamination.
Construct the query strings¶
field = 169 # SMASH field number to query
# Create a query of all sources in the Field, for comparison
query_all =\
"""SELECT ra,dec,gmag,rmag,imag
FROM smash_dr1.object
WHERE fieldid = '{:d}' AND
depthflag > 1 AND
ndetr > 3 AND ndetg > 3 AND
gmag BETWEEN 9 AND 25 AND random_id<10""".format(field)
# Create the query string; SQL keyword capitalized for clarity
# depth > 1 = no short exposures please
# ndetr, ndetg > 3 = more than 3 detections in r & g bands
# abs(sharp) < 0.5 = avoid broad objects
query =\
"""SELECT ra,dec,gmag,rmag,imag
FROM smash_dr1.object
WHERE fieldid = '{:d}' AND
depthflag > 1 AND
ndetr > 3 AND ndetg > 3 AND
abs(sharp) < 0.5 AND
gmag BETWEEN 9 AND 25 AND
(gmag-rmag) BETWEEN -0.4 AND 0.4""".format(field)
Submit the query¶
Running the query in synchroneous mode is very easy.
df_all = qc.query(sql=query_all,fmt='pandas') # response is by default a CSV-formatted string
print("Number of objects:", len(df_all))
print(df_all.head())
Number of objects: 117455
ra dec gmag rmag imag
0 184.876934 -32.868654 24.867119 24.545872 24.546236
1 184.892606 -32.874418 24.309147 25.094023 24.796555
2 184.894265 -32.873554 24.608845 24.403374 24.662037
3 184.895268 -32.872408 24.590645 24.494490 24.497213
4 184.896137 -32.871733 24.677746 24.908146 24.395891
R = qc.query(sql=query,fmt='pandas') # response is by default a CSV-formatted string
print("Number of objects:", len(R))
print(R.head())
Number of objects: 104974
ra dec gmag rmag imag
0 184.876674 -32.873511 24.746605 24.838743 24.185682
1 184.876606 -32.870861 24.156397 24.068817 23.074945
2 184.875853 -32.867214 24.084047 24.028060 23.630045
3 184.877080 -32.869780 24.482061 24.446104 23.858896
4 184.878492 -32.866905 24.678942 24.714973 24.624266
Spatial density plot¶
You might spot some overdensities already, particularly in the result with blue star-like objects.
fig, (ax1, ax2) = plt.subplots(1,2,figsize=(14,7), sharey=True, constrained_layout=True)
h1 = ax1.hexbin(df_all['ra'], df_all['dec'],gridsize=200)
ax1.set_xlabel('RA')
ax1.set_ylabel('Dec')
ax1.set_title('~100000 random objects')
h2 = ax2.hexbin(R['ra'], R['dec'],gridsize=200)
ax2.set_xlabel('RA')
ax2.set_title('~100000 blue stellar objects')
fig.colorbar(h1,label='number of objects per spatial bin');
Detect overdensities (differential Gaussian kernels)¶
Here we define the dwarf filter as a differential convolution of a two-dimensional image using two Gaussian kernels; this has the effect of convolution with a Mexican hat filter. The default kernel shapes look for objects on the scale of a few arcmin. The output includes a clipped array of the convolved spatial distribution, which we will use for peak detection.
def dwarf_filter (ra,dec,fwhm_small=2.0,fwhm_big=20):
"""Differential convolution with 2D Gaussian kernels.
Based on Koposov et al. (2008).
Code by Ken Mighell and Mike Fitzpatrick.
Minor edits by RN.
Parameters
----------
ra, dec : float or array
RA & Dec in degrees.
fwhm_small, fwhm_big : float
Full-width half maximum sizes of the small and big Gaussian kernels
to use in convolution, in arcminutes.
"""
x, y = ra, dec
print("Computing differential convolution .... ",)
# Information about declination (y) [degrees]
ymean = (y.min() + y.max()) / 2.0
ydiff_arcmin = (y.max() - y.min()) * 60.0 # convert from degrees to arcmin
# Information about right ascension (x) [degrees in time]:
xdiff = x.max() - x.min() # angular separation [degrees (time)]
xmean = (x.min() + x.max()) / 2.0
# convert from degrees in time to separation in angular degrees:
xdiff_angular = (x.max() - x.min()) * np.cos(ymean*(np.pi/180.0))
# convert from degress to arcmin
xdiff_angular_arcmin = xdiff_angular * 60.0
# Get the number of one-arcmin pixels in the X and Y directions:
nx = np.rint(xdiff_angular_arcmin).astype('int')
ny = np.rint(ydiff_arcmin).astype('int')
# Create a two-dimensional histogram of the raw counts:
Counts, xedges, yedges = np.histogram2d (x, y, (nx,ny) )
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
raw_hist = np.rot90(Counts).copy() # hack around Pythonic weirdness
# Make the small and big Gaussian kernels with a standard deviation
# of the given FWHM in arcmin^2 pixels.
kernel_small = convolution.Gaussian2DKernel(fwhm_small/2.35,factor=1)
kernel_big = convolution.Gaussian2DKernel(fwhm_big/2.35,factor=1)
# Compute the differential convolution kernels.
conv_big = convolution.convolve(raw_hist, kernel_big)
conv_small = convolution.convolve(raw_hist, kernel_small)
conv_delta = conv_small - conv_big
delta = conv_delta.copy()
# Compute statistics and the floor
mean = np.mean(delta, dtype='float64')
sigma = np.std(delta, dtype='float64')
sigmaRaw = np.std(raw_hist,dtype='float64')
median = np.median(delta) # not used
floor = mean
print('dwarf_filter: mean = {:g} sigma = {:g} sigmaRaw = {:g}'.format(mean, sigma, sigmaRaw))
clipped = delta.copy()
clipped[delta < floor] = floor
# Return the computed fields.
return raw_hist, extent, delta, clipped, sigma
Run the dwarf filter¶
We'll use the default convolution kernels of 2 and 20 arcminutes in size.
small_k, big_k = 2., 20. # kernel sizes in arcminutes
raw, extent, delta, clipped, dsigma = dwarf_filter(R['ra'],R['dec'],fwhm_small=small_k,fwhm_big=big_k)
Computing differential convolution .... dwarf_filter: mean = 0.0890555 sigma = 1.79455 sigmaRaw = 5.33591
Plot the convolved 2D histogram¶
fig, ax = plt.subplots(figsize=(7,6))
im = plt.imshow(clipped)
plt.xlabel('pixel')
plt.ylabel('pixel')
plt.colorbar(label='relative spatial density after convolution');
Some peaks are visible, let's locate them automatically...
Identify peaks¶
We'll use the photutils package to identify 10-sigma peaks in the clipped filtered image.
# find peaks
mean, median, std = stats.sigma_clipped_stats(clipped,sigma=3.0,maxiters=5)
tbl = find_peaks(clipped,median+10,box_size=small_k*2)
# add ra & dec positions of peaks found
a, b = extent[:2]
xvec = np.arange(a,b,(b-a)/clipped.shape[1])
a, b = extent[2:]
yvec = np.arange(a,b,(b-a)/clipped.shape[0])
tbl['ra'] = xvec[tbl['x_peak']]
tbl['dec'] = yvec[-tbl['y_peak']-1]
print(tbl)
x_peak y_peak peak_value ra dec
------ ------ ------------------ ------------------ -------------------
86 89 11.31152453793508 185.41055953270728 -31.976003447411664
34 100 11.512626647783886 184.39348041275326 -32.15954180336243
Show the identified density peaks¶
ecs = ['w','y'] # colors of box frames
ax.scatter(tbl['x_peak'],tbl['y_peak'],marker='s',s=tbl['peak_value']*40,c='none',edgecolors=ecs,lw=3) # keeps writing to previous ax
fig # repeats (the updated) figure
Inspect the image cutouts around the peaks¶
Simple Image Access service¶
Data Lab comes with batteries included. Image cutout and download services are built in.
We'll just write two little functions:
- one to download the deepest stacked images found in the given bands at a given position in the sky
- and a function to plot several images side-by-side.
# set up SIA
DEF_ACCESS_URL = "https://datalab.noirlab.edu/sia/coadd_all"
svc = sia.SIAService(DEF_ACCESS_URL)
# a little func to download the deepest stacked images
def download_deepest_images(ra,dec,fov=0.1,bands=list('gri')):
imgTable = svc.search((ra,dec), (fov/np.cos(dec*np.pi/180), fov), verbosity=2).to_table()
print("The full image list contains {:d} entries.".format(len(imgTable)))
sel0 = (imgTable['proctype'] == 'Stack') & (imgTable['prodtype'] == 'image') # basic selection
images = []
for band in bands:
print("Band {:s}: ".format(band)) #, end='')
sel = sel0 & (imgTable['obs_bandpass'] == band) # add 'band' to selection
Table = imgTable[sel] # select
row = Table[np.argmax(Table['exptime'].data.data.astype('float'))] # pick image with longest exposure time
url = row['access_url'] # get the download URL
print('downloading deepest stacked image...')
img = io.fits.getdata(utils.data.download_file(url,cache=True,show_progress=False,timeout=120)) # .decode() b/c in Python 3 url is of "byte" type and getdata() expects "string" type
images.append(img)
print("Downloaded {:d} images.".format(len(images)))
return images
# multi panel image plotter
def plot_images(images,titles=list('gri'),cmap=plt.cm.gray_r):
fig = plt.figure(figsize=(8,10),tight_layout=True)
grid = plt.GridSpec(4, 3)
ax1 = fig.add_subplot(grid[0, 0])
ax2 = fig.add_subplot(grid[0, 1])
ax3 = fig.add_subplot(grid[0, 2])
ax4 = fig.add_subplot(grid[1:, 0:])
for j,ax,img in zip(np.arange(4),[ax1,ax2,ax3,ax4],images):
ax.imshow(img,origin='lower',interpolation='none',cmap=cmap,norm=plt.mpl.colors.PowerNorm(0.1))
ax.axis('off')
ax.set_title('{:s}'.format(titles[j]))
Get images for the "left yellow" box¶
Download the deepest stacked image cutouts (in 3 bands) around the position of the peak marked with a yellow box. Depending on network speed and system load, this can take a few seconds. Also create a 3-band false-color composite.
bands = list('gri')
idx = 1
print(tbl['ra'][idx], tbl['dec'][idx])
images = download_deepest_images(tbl['ra'][idx], tbl['dec'][idx], fov=0.1, bands=bands) # FOV in deg
184.39348041275326 -32.15954180336243 The full image list contains 401 entries. Band g: downloading deepest stacked image... Band r: downloading deepest stacked image... Band i: downloading deepest stacked image... Downloaded 3 images.
Plot the images, plus a false-color 3-band image:
images = [im-np.median(im) for im in images] # subtract median from all images for better scaling
images += [make_lupton_rgb(*images[::-1],stretch=30)] # add a 3-color composite image
plot_images(images,titles=bands+['3-band image'])
Looks like a galaxy cluster!
Now the "white center box" object¶
idx = 0
images = download_deepest_images(tbl['ra'][idx], tbl['dec'][idx], fov=0.1, bands=bands) # FOV in deg
images = [im-np.median(im) for im in images] # subtract median from all images for better scaling
images += [make_lupton_rgb(*images[::-1],stretch=30)] # add a 3-color composite image
plot_images(images,titles=bands+['3-band image'])
The full image list contains 403 entries. Band g: downloading deepest stacked image... Band r: downloading deepest stacked image... Band i: downloading deepest stacked image... Downloaded 3 images.
Definitely not a galaxy cluster, but a stellar overdensity
Query the database for stars near peaks¶
Get photometry of stars within a 5 arcmin radius of each density peak.
def makequery(ra0,dec0,radius0=5./60.,field=169):
query = """
SELECT ra,dec,gmag,rmag,imag FROM smash_dr1.object
WHERE fieldid = {:d}
AND depthflag > 1
AND abs(sharp) < 0.5
AND gmag BETWEEN 9 AND 25
AND q3c_radial_query(ra,dec,{:f},{:f},{:f})
""".format(field,ra0,dec0,radius0)
return query
query0 = makequery(tbl['ra'][0],tbl['dec'][0]) # center ra & dec
R0 = qc.query(sql=query0, fmt='pandas') # using sql argument instead of the default adql
print(R0.head()) # a Pandas method
ra dec gmag rmag imag 0 185.341390 -32.034610 18.762146 17.609804 17.174831 1 185.340696 -32.033947 24.666931 NaN NaN 2 185.352840 -32.038977 24.814121 24.343794 23.955757 3 185.345295 -32.033874 24.745580 24.538908 24.439932 4 185.348514 -32.033831 24.887840 24.834295 24.795055
Compute color¶
R0['g_r'] = R0['gmag'] - R0['rmag']
print( R0.head())
ra dec gmag rmag imag g_r 0 185.341390 -32.034610 18.762146 17.609804 17.174831 1.152342 1 185.340696 -32.033947 24.666931 NaN NaN NaN 2 185.352840 -32.038977 24.814121 24.343794 23.955757 0.470327 3 185.345295 -32.033874 24.745580 24.538908 24.439932 0.206672 4 185.348514 -32.033831 24.887840 24.834295 24.795055 0.053545
Now the other peak¶
query1 = makequery(tbl['ra'][1],tbl['dec'][1])
R1 = qc.query(sql=query1,fmt='pandas')
R1['g_r'] = R1['gmag'] - R1['rmag'] # compute color
print(R1.tail()) # another Pandas methodb
ra dec gmag rmag imag g_r 2170 184.471635 -32.119314 24.889196 24.793587 24.467518 0.095609 2171 184.481076 -32.122957 22.749125 21.331306 20.392641 1.417819 2172 184.470299 -32.112379 23.647750 22.085228 21.719313 1.562522 2173 184.470477 -32.112378 23.919018 22.351788 21.632957 1.567230 2174 184.467250 -32.106465 24.091139 23.340645 22.943100 0.750494
Color-magnitude diagrams of the peaks¶
def plotpanel(axid,x,y,title='',xlim=(-1,2),ylim=(25.2,14)):
ax = fig.add_subplot(axid)
ax.scatter(x,y,marker='.',s=10, alpha=0.8)
ax.set_xlabel(x.name)
ax.set_ylabel(y.name)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
ax.set_title(title)
fig = plt.figure(figsize=(12,5.5))
plotpanel(121,R0['g_r'],R0['gmag'],'white box, {:d} objects'.format(len(R0)))
plotpanel(122,R1['g_r'],R1['gmag'],'yellow box, {:d} objects'.format(len(R1)))
Compare w/ Vivas+2016¶

Save your photmetry table¶
If you wish, you can now save your photometry table for Hydra II to a local file and take it with you.
outfile = 'hydra2.csv'
R0.to_csv(outfile,index=False)
hydra2.csv <-- Download the file as you please.
Some ideas for improvement¶
Find overdensities elsewhere (everywhere!)
optimize kernel sizes, computation, proper wavelets
Some resources¶
Nidever et al. (2017) "SMASH - Survey of the MAgellanic Stellar History": http://adsabs.harvard.edu/abs/2017AJ....154..199N
Stanford et al. (2005, ApJ, 634, 2, L129) "An IR-selected Galaxy Cluster at z = 1.41": http://adsabs.harvard.edu/abs/2005ApJ...634L.129S
Koposov et al. (2008, ApJ, 686, 279) "The Luminosity Function of the Milky Way Satellites": http://adsabs.harvard.edu/abs/2008ApJ...686..279K
Martin et al. (2015, ApJ, 804, 5) "Hydra II: A Faint and Compact Milky Way Dwarf Galaxy Found in the Survey of the Magellanic Stellar History": http://adsabs.harvard.edu/abs/2015ApJ...804L...5M